One of the key interests in the social sciences is the investigation of change and stability 
of a given attribute. Although numerous models have been proposed in the past for 
analyzing longitudinal data including multilevel and/or latent variable modeling approaches, 
only a few modeling approaches have been developed for studying the construct validity 
in longitudinal multitrait-multimethod (MTMM) measurement designs. The aim of the 
present study was to extend the spectrum of current longitudinal modeling approaches 
for MTMM analysis. Specifically, a new longitudinal multilevel CFA-MTMM model for 
measurement designs with structurally different and interchangeable methods (called 
Latent-State-Combination-Of-Methods model, LS-COM) is presented. Interchangeable 
methods are methods that are randomly sampled from a set of equivalent methods (e.g., 
multiple student ratings for teaching quality), whereas structurally different methods are 
methods that cannot be easily replaced by one another (e.g., teacher ratings, self-ratings, principal 
ratings). Results of a simulation study indicate that the parameters and standard errors in 
the LS-COM model are well recovered even in conditions with only five observations per 
estimated model parameter. The advantages and limitations of the LS-COM model relative 
to other longitudinal MTMM modeling approaches are discussed. 
1. INTRODUCTION 

An increasing body of research is devoted to longitudinal data 
analysis examining the change and stability of a given attribute 
across time (see Singer and Willett, 2003; Khoo et al., 2006). The 
prominence of longitudinal studies may be explained by the fact 
that longitudinal measurement designs bear many advantages. 
Longitudinal measurement designs are more informative than 
cross-sectional studies, allowing researchers to (1) investigate 
change and/or variability processes, (2) test the degree of mea- 
surement invariance as well as indicator-specific effects, and (3) 
examine potential causal relationships (see Steyer, 1988, 2005). 
Over the last decades, many statistical models have been proposed 
for analyzing longitudinal data including multilevel as well as 
latent variable modeling approaches (cf. Little et al., 2000; Singer 
and Willett, 2003; Rabe-Hesketh and Skrondal, 2004; Steele et al., 
2008; Heck et al., 2013). On the other hand, only a few attempts 
have been made to develop appropriate models for longitudinal 
multitrait-multimethod (MTMM) data (e.g., Kenny and Zautra, 
2001; Burns and Haynes, 2006; Courvoisier et al., 2008; Grimm 
et al., 2009; Geiser et al., 2010; Koch, 2013). 

Originally, multitrait-multimethod (MTMM) analysis was 
developed for scrutinizing the construct validity of social 
science measures (Campbell and Fiske, 1959). According to 
Campbell and Fiske (1959) at least two traits (e.g., empa- 
thy and aggression) and two methods (e.g., student reports 
and teacher reports) are required for investigating the degree 
of convergent and discriminant validity among different mea- 
sures. Convergent validity refers to the associations (correlations) 
between two methods measuring the same trait (e.g., the correla- 
tion between empathy measured via student and teacher reports). 
Discriminant validity refers to the question of whether and to 
what extent methods are able to differentiate between differ- 
ent traits (e.g., the correlation between self-reported empathy and 
self-reported aggression). 
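
The two definitions above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original study: it assumes two uncorrelated traits, additive rater-specific method effects, and arbitrary standard deviations, and the function names (`pearson`, `simulate_mtmm`) are hypothetical. It compares the correlation of the same trait across two methods (convergent validity) with the correlation of two traits within the same method (relevant for discriminant validity).

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_mtmm(n=5000, seed=1):
    """Simulate two uncorrelated traits rated by two methods (assumed values)."""
    rng = random.Random(seed)
    data = {"emp_student": [], "emp_teacher": [], "agg_student": []}
    for _ in range(n):
        empathy, aggression = rng.gauss(0, 1), rng.gauss(0, 1)       # trait scores
        m_student, m_teacher = rng.gauss(0, 0.5), rng.gauss(0, 0.5)  # method effects
        data["emp_student"].append(empathy + m_student + rng.gauss(0, 0.5))
        data["emp_teacher"].append(empathy + m_teacher + rng.gauss(0, 0.5))
        data["agg_student"].append(aggression + m_student + rng.gauss(0, 0.5))
    return data


data = simulate_mtmm()
# Same trait, different methods: high values indicate convergent validity.
convergent = pearson(data["emp_student"], data["emp_teacher"])
# Different traits, same method: low values indicate discriminant validity.
discriminant = pearson(data["emp_student"], data["agg_student"])
print(f"same trait, different methods: r = {convergent:.2f}")
print(f"different traits, same method: r = {discriminant:.2f}")
```

In this sketch the within-method correlation of the two traits is not zero even though the traits are uncorrelated, because both measures share the same rater's method effect. This is the kind of method influence that MTMM models aim to separate from trait variance.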

Combining the advantages of longitudinal modeling 
approaches and MTMM modeling approaches can be fruit- 
ful. For example, longitudinal MTMM models allow researchers 
to investigate the construct validity of different measures across 
time by combining the information provided by multiple 
methods or reporters in a single model. This is useful because 
a researcher would otherwise have to estimate separate longi- 
tudinal models for each reporter and no information as to the 
relationship between reporters could be obtained. Moreover, 
longitudinal MTMM models allow modeling method effects, 
examining the stability and change of these method effects across 
time, and scrutinizing potential causes of method effects by 
including other (manifest or latent) variables in the model. 

The purpose of the present work is to extend the range of lon- 
gitudinal models for the analysis of complex longitudinal MTMM 
data by presenting a comprehensive modeling framework for dif- 
ferent types of methods. Specifically, we present a new multilevel 
structural equation model for the analysis of longitudinal MTMM 
data featuring interchangeable and structurally different meth- 
ods. The model is called the Latent-State-Combination-Of-Methods 
(LS-COM) model. The LS-COM model combines the 
advantages of four modeling approaches, that is, structural equa- 
tion modeling, multilevel modeling, longitudinal modeling, and 
MTMM modeling with interchangeable and structurally differ- 
ent methods. In particular, the LS-COM allows researchers to 
(1) explicitly model measurement error, (2) specify method fac- 
tors on different measurement levels, (3) analyze the convergent 
and discriminant validity across multiple occasions, (4) investi- 
gate change and stability of construct and method effects across 
time, and (5) test important assumptions in longitudinal data 
analysis such as the degree of measurement invariance. The LS- 
COM model is formulated based on the principles of stochastic 
measurement theory (Zimmerman, 1975; Steyer and Eid, 2001), 
which has the advantage that all latent variables in the model are 
psychometrically well-defined as random variables with a clear 
meaning. 

The article is structured as follows: First, we review con- 
ventional (single-method) models of longitudinal confirmatory 
factor analysis with a special focus on latent state (LS) mod- 
els (Steyer et al., 1992). Second, we discuss current extensions 
of LS-modeling approaches to MTMM designs with structurally 
different methods. In this regard, we review the correlated state- 
correlated method minus one [CS-C(M-1)] model by Geiser 
et al. (2010). Furthermore, we explain the differences between 
measurement designs with structurally different methods, inter- 
changeable methods, or a combination of both methods. We show 
that the CS-C(M-1) model is useful for modeling data obtained 
from longitudinal MTMM measurement designs with struc- 
turally different methods, but that this model is not suitable for 
measurement designs with a combination of structurally different 
and interchangeable methods. Third, we present the new LS- 
COM model for longitudinal MTMM designs with structurally 
different and interchangeable methods. The new LS-COM model 
fills a gap in the literature, as previous approaches to longitudi- 
nal MTMM analysis focused exclusively on structurally different 
methods. Fourth, we report the results of a Monte Carlo simula- 
tion study in which we examined the statistical performance of 
the LS-COM model. Finally, we discuss the advantages and lim- 
itations of the LS-COM model compared to other longitudinal 
MTMM modeling approaches. 

2. LONGITUDINAL CONFIRMATORY FACTOR ANALYSIS 

The versatility and flexibility of the CFA framework have inspired 
the development of different CFA models for longitudinal mea- 
surement designs, for example, autoregressive models (Hertzog 
and Nesselroade, 1987; Jöreskog, 1979a,b; Marsh, 1993; Eid and 
Hoffmann, 1998), latent state models (Steyer et al., 1992), latent 
change (difference score) models (Steyer et al., 1997, 2000; 
McArdle, 1988), latent state-trait models (Steyer et al., 1992, 
1999), and latent growth curve models (McArdle and Epstein, 
1987; Meredith and Tisak, 1990; Hancock et al., 2001; Bollen 
and Curran, 2006; Duncan et al., 2006). Most previous longitu- 
dinal models have been designed for single method measurement 
designs (e.g., self-reports) only. Presumably, the simplest CFA 
model for longitudinal data is the latent state (LS) model, which 
represents an extension of classical test theory to longitudinal 
measurement designs (see Steyer et al., 1992; Marsh, 1993; Tisak 
and Tisak, 2000; Geiser, 2009). The LS model is often used as a 
baseline model, given that it implies no restrictions with regard 
to the structural part of the model (see Figure 1). Hence, the LS 
model is often used for testing the measurement model (e.g., the 
validity of the assumed factor structure, measurement invariance 
restrictions, correlations of error variables, unidimensionality of 
the scales on an occasion of measurement). According to latent 
state theory (see Steyer et al., 1992), each observed variable $Y_{il}$ 
can be decomposed into a latent state variable $S_{il}$ (the occasion-specific true 
score) and a measurement error variable $\epsilon_{il}$, where $i$ 
denotes the indicator (item or parcel) and $l$ denotes the occasion of 
measurement: 

$$Y_{il} = S_{il} + \epsilon_{il}. \tag{1}$$

The latent state variable $S_{il}$ represents the individual state scores 
at a particular occasion of measurement, whereas the measure- 
ment error variables reflect unsystematic influences due to mea- 
surement error. It can be shown that the additive decomposition 
of the observed variables $Y_{il}$ into a latent state variable $S_{il}$ and 
a latent measurement error variable $\epsilon_{il}$ follows directly if both 
latent variables are defined in terms of conditional expectations 
(see Steyer, 1988, 1989; Steyer et al., 1992). In order to estimate 
a latent state model, it is assumed that (1) the latent state vari- 
ables belonging to the same occasion of measurement are linear 
functions of each other (i.e., congenerity assumption): 

$$S_{il} = \alpha_{ii'l} + \lambda_{ii'l} S_{i'l} \tag{2}$$



and that (2) the measurement error variables are uncorrelated with each other 
[i.e., $\mathrm{Cov}(\epsilon_{il}, \epsilon_{i'l'}) = 0$ for $(i, l) \neq (i', l')$]. Equation (2) 
states that the latent state variables are linear functions of each 
other and only differ by an additive constant $\alpha_{ii'l}$ and a multiplica- 
tive constant $\lambda_{ii'l}$. With respect to this assumption, it is possible to 
show that Equation (2) is equivalent to $S_{il} = \alpha_{il} + \lambda_{il} S_l$. Hence, 
the general measurement equation of a latent state model with 
common latent state factors can be written as follows: 

$$Y_{il} = \alpha_{il} + \lambda_{il} S_l + \epsilon_{il}. \tag{3}$$



Here, $\alpha_{il}$ is the intercept and $\lambda_{il}$ is the factor loading parameter pertain- 
ing to the latent state factors. As a consequence of the assumptions 
explained above, the total variance of the observed variables can 
be decomposed as follows: 

$$\mathrm{Var}(Y_{il}) = \lambda_{il}^2 \mathrm{Var}(S_l) + \mathrm{Var}(\epsilon_{il}). \tag{4}$$

The reliability of each observed variable is then given by: 

$$\mathrm{Rel}(Y_{il}) = \frac{\lambda_{il}^2 \mathrm{Var}(S_l)}{\mathrm{Var}(Y_{il})}. \tag{5}$$



Figure 1 shows a path diagram of the latent state model for three 
indicators and three occasions. 
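
The measurement equation, the variance decomposition, and the reliability coefficient of the LS model can be checked numerically. The sketch below is illustrative only: the parameter values (intercept 2.0, loading 0.8, state variance 1.0, error variance 0.36) are assumptions chosen for convenience, normal distributions are assumed, and the helper names are hypothetical. It simulates one indicator on one occasion and compares the empirical variance with the model-implied variance.

```python
import random


def reliability(loading, var_state, var_error):
    """Reliability of an indicator: true-score variance over total variance."""
    true_var = loading ** 2 * var_state
    return true_var / (true_var + var_error)


def simulate_indicator(n, intercept, loading, var_state, var_error, seed=0):
    """Draw n observations of Y = intercept + loading * S + error."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        state = rng.gauss(0, var_state ** 0.5)   # latent state score
        error = rng.gauss(0, var_error ** 0.5)   # measurement error
        ys.append(intercept + loading * state + error)
    return ys


def variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


# Assumed (hypothetical) parameter values for a single indicator and occasion.
y = simulate_indicator(n=20000, intercept=2.0, loading=0.8,
                       var_state=1.0, var_error=0.36)
implied_var = 0.8 ** 2 * 1.0 + 0.36   # loading^2 * Var(S) + Var(error)
print(f"empirical Var(Y) = {variance(y):.3f}, implied Var(Y) = {implied_var:.3f}")
print(f"Rel(Y) = {reliability(0.8, 1.0, 0.36):.2f}")
```

With these assumed values, 64% of the indicator variance is due to the latent state factor and 36% is due to measurement error.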

The correlations between the latent state factors $S_l$ characterize 
the stability of interindividual differences on the given attribute 
(see Figure 1). High correlations reflect that individual differ- 
ences with regard to a particular attribute (construct) are rather 
stable over time. Researchers may also investigate mean change of 
a given construct across time. For meaningful interpretations of 
latent mean change, we recommend that measurement invariance 
(MI) should be tested and that researchers should at least estab- 
lish strong MI (e.g., Meredith, 1993; Widaman and Reise, 1997; 
Millsap, 2012). 

FIGURE 1 | A latent state (LS) model with three indicators ($i = 3$) and 
three occasions of measurement ($l = 3$), where $S_l$: latent state factors, 
$\lambda_{il}$: latent state factor loadings, $\epsilon_{il}$: measurement error variables. 
Intercepts ($\alpha_{il}$) are not depicted in the figure. 

Strong MI can be established by imposing the following 
restrictions: 

1. The intercepts of the observed variables $\alpha_{il}$ have to be set equal 
across time (i.e., $\alpha_{il} = \alpha_{il'} = \alpha_i$). 

2. The factor loading parameters $\lambda_{il}$ have to be set equal across 
time (i.e., $\lambda_{il} = \lambda_{il'} = \lambda_i$) and one factor loading parameter 
on each occasion of measurement has to be fixed to the same 
value (e.g., $\lambda_1 = 1$). 

3. The mean of the first latent state factor has to be set to zero 
[i.e., $E(S_1) = 0$]. 

4. The means of the remaining latent state factors can be freely 
estimated [i.e., $E(S_l) \neq 0$]. 
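
A short derivation may help to see why these restrictions matter. It is offered as an illustrative sketch, assuming the measurement equation of the LS model with time-invariant intercepts $\alpha_i$ and loadings $\lambda_i$ (restrictions 1 and 2) and measurement error variables with an expectation of zero, as holds for residuals of a conditional expectation:

```latex
\begin{align*}
E(Y_{il}) &= \alpha_i + \lambda_i E(S_l), \\
E(Y_{il}) - E(Y_{i1}) &= \lambda_i \left[ E(S_l) - E(S_1) \right] = \lambda_i E(S_l).
\end{align*}
```

The last equality uses restriction 3, $E(S_1) = 0$. Without restrictions 1 and 2, the same difference would equal $(\alpha_{il} - \alpha_{i1}) + \lambda_{il} E(S_l) - \lambda_{i1} E(S_1)$, so that changes in the observed means could reflect changes in the measurement parameters rather than changes in the latent state means.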

Strong MI is a prerequisite for studying true mean change 
(Steyer et al., 1997, 2000)¹. Restrictions 3 and 4 allow 
examining mean change relative to the first measurement 
occasion². Although the LS model (as well as other longitudinal 
CFA models) offers many advantages such as analyzing change 
and stability of an attribute apart from measurement error influ- 
ences and testing the degree of measurement invariance, the LS 
model is limited in terms of incorporating data from multiple 
raters or methods, because the model does not contain method 
factors. In order to study the convergent and discriminant valid- 
ity in longitudinal MTMM designs, more sophisticated models 
are needed. 

3. LONGITUDINAL CFA-MTMM MODELS 

According to Eid and Diener (2006) multimethod measure- 
ment designs overcome many limitations of single method 
measurement designs and should therefore be preferred when- 
ever possible. With respect to longitudinal CFA-MTMM models 
it is possible to (1) investigate the convergent and discriminant 
validity at each occasion of measurement and across different 
occasions of measurement, (2) study change and stability of con- 
struct and method effects across time, (3) model measurement 
error, (4) investigate the generalizability of method effects, and 
(5) test important assumptions such as measurement invariance 
and/or indicator-specific effects. 

Today, MTMM measurement designs are commonly analyzed 
using confirmatory factor analysis (CFA-MTMM models) with 
multiple indicators in each trait-method unit (e.g., Widaman, 
1985; Marsh and Hocevar, 1988; Marsh, 1989; Wothke, 1995; 
Dumenci, 2000; Eid, 2000; Eid et al., 2003, 2006). Up to now, 
only a few CFA-MTMM models have been proposed for the analy- 
sis of longitudinal data (e.g., Kenny and Zautra, 2001; Burns and 
Haynes, 2006; Courvoisier et al., 2008; Grimm et al., 2009; Geiser 
et al., 2010; Koch, 2013). 

One exception is the study by Grimm et al. (2009) who recently 
proposed a longitudinal CFA-MTMM model combining the cor- 
related trait-correlated method (CT-CM) approach (Widaman, 
1985; Marsh and Grayson, 1995) and the latent growth curve 
modeling approach (e.g., McArdle and Epstein, 1987; Meredith 
and Tisak, 1990). However, results of previous studies have shown 
that the CT-CM modeling approach is associated with various 
theoretical and empirical problems (e.g., Marsh, 1989; Kenny 
and Kashy, 1992; Marsh and Grayson, 1994; Steyer, 1995; Eid, 
2000; Geiser et al., in press). In addition, the CFA-MTMM model 
by Grimm et al. (2009) is limited to single-indicator measure- 
ment designs and does not allow specifying trait-specific method 
factors. 

Geiser et al. (2010) developed a longitudinal CFA-MTMM 
model [called correlated state-correlated method minus one, CS- 
C(M-1) model] that combines LS theory with the correlated 
trait-correlated method minus one [CT-C(M-1)] approach (Eid, 
2000; Eid et al., 2003, see Figure 2). In this model, one method 
has to be chosen as the reference method, to which all other meth- 
ods are compared. The common latent state factor is the 
state factor of the reference method. Each observed variable of a 
non-reference method is decomposed into three parts: (1) a part 
that is predictable by the common state factor, (2) a part that is 
method-specific, and (3) measurement error. One advantage of 
the CS-C(M-1) model is that all latent variables are well-defined 
with a clear and unambiguous interpretation (Geiser, 2009). The 
CS-C(M-1) model also overcomes many limitations of previous 
CFA-MTMM modeling approaches. For example, the CS-C(M-1) 
model allows specifying trait-specific method factors using 
multiple indicators per trait-method unit (TMU) and separating 
the observed variance into trait, method, and measurement error 
variance. According to the results of simulation studies (Crayen, 
2008; Geiser, 2009), the CS-C(M-1) model performs well in many 
conditions. 

¹ For more details on partial MI, see Byrne et al. (1989), and on approximate 
measurement invariance, see Van De Schoot et al. (2013). 

² Another possibility is to set the intercept of a reference indicator (e.g., first 
indicator) to zero on all occasions of measurement and estimate the latent 
means of the latent state factor on each occasion of measurement. 

FIGURE 2 | A latent correlated state-correlated method minus one 
[CS-C(M-1)] model with three indicators ($i = 3$), one construct ($j = 1$), 
two methods ($k = 2$) and three occasions of measurement ($l = 3$), 
where $Y_{ijkl}$: observed variables, $S_{jkl}$: latent state factors, $\lambda_{Sijkl}$: state 
factor loadings, $M_{jkl}$: latent method factors, $\lambda_{Mijkl}$: method factor 
loadings, $\epsilon_{ijkl}$: measurement error variables. Intercepts ($\alpha_{ijkl}$) are not 
depicted in the figure. 

However, the CS-C(M-1) model cannot be applied to all 
possible longitudinal MTMM measurement designs. In par- 
ticular, the CS-C(M-1) model is not suitable for MTMM 
measurement designs combining structurally different and inter- 
changeable methods. In the next section, the differences between 
measurement designs with structurally different and interchange- 
able methods are explained in greater detail. 



4. DIFFERENT TYPES OF METHODS 

Eid et al. (2008) clarified that the type of method used in a study 
is of particular importance for defining appropriate CFA-MTMM 
models. More specifically, Eid et al. (2008) showed that measure- 
ment designs with (a) interchangeable methods, (b) structurally 
different methods, and (c) a combination of structurally dif- 
ferent and interchangeable methods imply different sampling 
procedures and therefore require different CFA-MTMM models. 
According to Eid et al. (2008), interchangeable methods are meth- 
ods that can be randomly sampled from a set of similar methods. 
Consider, for example, multiple peer ratings of students' empathy. 
Both peer ratings and subordinates' ratings can be considered 
interchangeable, because the raters have more or less the same access 
to the target's behavior (Eid et al., 2008). Figure 3B illustrates the 
sampling procedure for interchangeable methods. According to 
Figure 3B, measurement designs with interchangeable methods 
imply a multistage sampling procedure (Eid et al., 2008; Koch 
et al., in press). First, a target ($t$, e.g., a teacher) is randomly chosen 
from the set of all possible targets ($t \in T$, i.e., all teachers). Second, 
multiple (e.g., three) students (e.g., Edgar, Emily, and Mark) are 
randomly sampled from the same target-specific rater set $R_t$. 
Therefore, measurement designs with interchangeable methods 
imply a multilevel data structure (Eid et al., 2008). 
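
The multistage sampling procedure for interchangeable methods can be sketched in a few lines. The example below is a hypothetical illustration: the teacher labels, most student names, and the function `sample_design` are invented for this purpose and do not stem from the original study. Targets are drawn first, and raters are then drawn from each sampled target's own rater set, which yields the nested (multilevel) data structure described above.

```python
import random


def sample_design(rater_sets, n_targets, n_raters, seed=0):
    """Two-stage sampling: first targets, then raters within each sampled target.

    rater_sets maps each target t to its target-specific rater set R_t.
    """
    rng = random.Random(seed)
    # Stage 1: randomly choose targets from the set of all possible targets.
    targets = rng.sample(sorted(rater_sets), n_targets)
    # Stage 2: randomly choose raters from each chosen target's rater set.
    return {t: rng.sample(sorted(rater_sets[t]), n_raters) for t in targets}


# Hypothetical population: four teachers, each rated by students of their class.
rater_sets = {
    "teacher_A": {"Edgar", "Emily", "Mark", "Nina", "Omar"},
    "teacher_B": {"Paula", "Quinn", "Rosa", "Sam"},
    "teacher_C": {"Tara", "Uwe", "Vera", "Will", "Xena"},
    "teacher_D": {"Yara", "Zoe", "Alan", "Bea"},
}

design = sample_design(rater_sets, n_targets=2, n_raters=3)
for target, raters in design.items():
    print(target, "<-", raters)
```

Because the raters of different targets come from different rater sets, a given rater label has no meaning across targets, which is why such raters are treated as interchangeable (random) rather than fixed.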

In contrast, measurement designs with structurally different 
methods (see Figure 3A) use methods that are not randomly 
sampled out of a common set of similar methods (raters). For 
example, structurally different methods such as self-ratings, par- 
ent ratings, and the ratings of the class teacher do not stem 
from the same group of methods, but differ in many ways. As 
a consequence, measurement designs with structurally different 
methods can usually be modeled with single-level factor mod- 
els [e.g., CS-C(M-1) model]. An increasing number of studies 
use a combination of structurally different and interchange- 
able methods. For example, in organizational psychology it is 
very common to use self-reports, supervisor reports, and inter- 
changeable colleague reports (so-called 360° feedback designs). 
In educational and developmental psychology, many researchers 
use student reports, teacher and parent reports, as well as inter- 
changeable peer reports. All of these designs imply a combination 
of structurally different and interchangeable methods. 

5. THE NEED FOR LONGITUDINAL MULTILEVEL CFA-MTMM 
MODELS 

So far, no appropriate CFA model has been proposed for 
longitudinal MTMM data combining structurally different and 
interchangeable methods. Researchers who use such MTMM 
measurement designs (e.g., longitudinal multisource feedback 
designs with different types of raters) are therefore forced to either 
aggregate the interchangeable ratings into a single score or analyze 
both types of methods (structurally different and interchangeable 
methods) in separate models. The aggregation of level one units 
(here interchangeable methods) has been associated with various 
methodological shortcomings, such as interpretation problems 
(e.g., ecological fallacy), loss of information, smaller sample size, 
larger standard errors, and loss of power (Hox, 2010; Snijders 
and Bosker, 2011). If both types of methods are analyzed sep- 
arately, then researchers are not able to integrate (or compare) 
the information of both types of methods (rater groups) in the 
same model. For example, the convergent validity of interchange- 
able peer reports and self-reports could not be examined. Given that 
many researchers increasingly apply measurement designs with a 
combination of structurally different and interchangeable meth- 
ods, there is a need for developing new methods for the analysis 
of such complex MTMM measurement designs. In the next sec- 
tion, we present the LS-COM model, which integrates LS theory 
and the CS-C(M-1) modeling approach for a combination of 
different types of methods. In addition, we present the results 
of a Monte Carlo simulation study, in which we examined the 
statistical performance of the LS-COM model under different 
conditions. 

FIGURE 3 | Sampling procedure for different types of methods. Panel (A) 
refers to the sampling procedure of measurement designs with structurally 
different methods. Panel (B) refers to the sampling procedure of measurement 
designs with interchangeable methods. The big gray filled circles are the sets 
of possible targets, the small gray filled circles are the sets of possible 
methods (raters). The black filled circles represent possible observation units 
belonging to the set of targets or raters. Structurally different (fixed) methods 
are indicated by straight lines connecting raters and targets directly. 
Interchangeable (random) methods are illustrated by a black line connecting 
a target with a particular set of possible raters. 

6. THE LATENT STATE COMBINATION OF METHODS 
(LS-COM) MODEL 

The LS-COM model allows researchers 

1. to scrutinize the degree of measurement invariance across 
time, 

2. to test mean changes of particular constructs, 

3. to examine the stability and change of construct and method 
effects across time, 

4. to investigate the psychometric properties (e.g., the convergent 
and discriminant validity and reliability) of the given mea- 
sures on each occasion of measurement and across occasions 
of measurement, and 

5. to scrutinize the generalizability of method effects across dif- 
ferent methods and/or different constructs. 

Similar to the CS-C(M-1) model, we define the LS-COM model 
in different steps. 

6.1. STEP 1: CHOICE OF REFERENCE METHOD AND BASIC 
DECOMPOSITION 

In the first step, a reference or gold-standard method has to be 
chosen. The remaining methods serve as non-reference methods. 



The reference method is often a method that is either seen as 
most valid by the researcher based on theory or prior empirical 
results or a method that is particularly outstanding or special 
relative to the other methods (e.g., objective IQ tests versus 
self-ratings of intelligence). In the LS-COM model either one of 
the structurally different or the set of interchangeable methods 
can be chosen as reference method. For the sake of simplicity, we 
define the LS-COM model for two structurally different methods 
(method 1 = self-report, method 2 = parent report) and one set 
of interchangeable methods (method 3 = multiple peer reports 
for a student). Note that the LS-COM model is not restricted in 
terms of the number of structurally different methods. Moreover, 
we chose the first method (a structurally different method, e.g., 
self-reports) as reference method and assume that there is only 
one parent report for each target. Pham et al. (2012) as well as 
Schultze (2012) show how the set of interchangeable methods 
can be chosen as reference method. The observed variables of 
each method can be decomposed into a latent state and a latent 
measurement error variable: 



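$$\text{Level 2:} \quad Y_{tij1l} = S_{tij1l} + \epsilon_{tij1l} \quad \text{(structurally different method 1)} \tag{6}$$

$$\text{Level 2:} \quad Y_{tij2l} = S_{tij2l} + \epsilon_{tij2l} \quad \text{(structurally different method 2)} \tag{7}$$

$$\text{Level 1:} \quad Y_{rtij3l} = S_{rtij3l} + \epsilon_{rtij3l}. \quad \text{(set of interchangeable methods)} \tag{8}$$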

The index $i$ represents the indicator, $j$ the construct, $k$ 
the method, and $l$ the occasion of measurement. In addition, 
the indices $r$ for rater and $t$ for target are required. The reason 
is that the interchangeable raters $r$ are nested within different 
targets $t$. Hence, the observed variables of the self-reports and par- 
ent reports are measured on Level 2 (the target level), whereas 
the observed variables pertaining to the interchangeable meth- 
ods (peer reports) are measured on Level 1 (the rater level). 
A value of the target-specific latent state variables $S_{tijkl}$ is the true 
score of target $t$ with respect to indicator $i$, construct $j$, method 
$k$ (i.e., self-report or parent report), and occasion of measure- 
ment $l$. The rater-specific latent state variables $S_{rtij3l}$ reflect the 
(method-specific) true peer rating of a rater $r$ for a particular tar- 
get $t$ on indicator $i$, construct $j$, and occasion of measurement 
$l$. The measurement error variables on both levels are repre- 
sented by $\epsilon_{tijkl}$ (Level 2) and $\epsilon_{rtij3l}$ (Level 1). In Appendix 
A in the Supplementary Material, we show how the latent state and 
measurement error variables are formally defined in terms of 
conditional expectations. 

6.2. STEP 2: DEFINITION OF RATER-SPECIFIC LATENT METHOD 
VARIABLES ON LEVEL 1 

In the second step, rater-specific (Level 1) latent method variables 
are defined for the interchangeable methods (i.e., multiple 
peer reports). This is possible given that multiple peers $r$ rate 
each target $t$ on different items (indicators $i$). Therefore, the 
rater-specific latent state variables can be decomposed into a 
rater-unspecific latent state variable $S_{tij3l}$ and a rater-specific 
unique method variable $UM_{rtij3l}$: 

$$Y_{rtij3l} = S_{rtij3l} + \epsilon_{rtij3l} \tag{9}$$
$$S_{rtij3l} = S_{tij3l} + UM_{rtij3l} \tag{10}$$
$$Y_{rtij3l} = S_{tij3l} + UM_{rtij3l} + \epsilon_{rtij3l}. \tag{11}$$

A value of the latent state variables $S_{tij3l}$ can be conceived as 
the expected peer rating of the target $t$ across the true occasion- 
specific peer ratings for that target. That is, the latent state vari- 
ables $S_{tij3l}$ can be conceived as the average peer rating and are thus 
variables on Level 2. A value of the latent unique method vari- 
ables $UM_{rtij3l}$ is the true occasion-specific deviation of a particular 
rater from this true mean. Hence, a value of the $UM_{rtij3l}$ variables 
represents the over- or underestimation of the true expected 
peer rating by a particular rater $r$. Positive values indicate an 
overestimation, whereas negative values indicate an underesti- 
mation of the true expected peer rating by a particular rater. 
Given that the unique method variables $UM_{rtij3l}$ are defined as 
latent residual variables, the general properties of residual vari- 
ables hold. That means that the unique method variables $UM_{rtij3l}$ 
are uncorrelated with the Level 2 latent state variables $S_{tij3l}$ [i.e., 
$\mathrm{Cor}(S_{tij3l}, UM_{rtij3l}) = 0$] and have an expectation (mean) of zero 
[i.e., $E(UM_{rtij3l}) = 0$]. Moreover, as in classical multilevel (struc- 
tural equation) models, it is assumed that the Level 1 residuals 
(here: the $UM_{rtij3l}$ variables) are independently and identically 
distributed on Level 1 (i.e., iid assumption). 
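
The decomposition of rater-specific states into a target-level state and a rater-specific deviation can be illustrated with simulated data. This is a minimal sketch under assumed values (normal distributions, a target-level mean of 3.0, standard deviations of 1.0 and 0.7, five raters per target); the function names are hypothetical, and the latent variables are treated as if they were directly observable, which they are not in practice.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_peer_states(n_targets=2000, n_raters=5, seed=2):
    """Generate rater-specific states as target-level state plus unique deviation."""
    rng = random.Random(seed)
    target_states, unique_effects, rater_states = [], [], []
    for _ in range(n_targets):
        s_t = rng.gauss(3.0, 1.0)             # Level 2: expected peer rating of target t
        for _ in range(n_raters):
            um_rt = rng.gauss(0.0, 0.7)       # Level 1: deviation of rater r from s_t
            target_states.append(s_t)
            unique_effects.append(um_rt)
            rater_states.append(s_t + um_rt)  # rater-specific latent state
    return target_states, unique_effects, rater_states


s_t, um_rt, s_rt = simulate_peer_states()
mean_um = sum(um_rt) / len(um_rt)
cor_s_um = pearson(s_t, um_rt)
print(f"mean of UM = {mean_um:.3f}")
print(f"Cor(S_t, UM_rt) = {cor_s_um:.3f}")
```

In this sketch, positive values of `um_rt` correspond to raters who overestimate the expected peer rating of their target and negative values to raters who underestimate it. The sample mean and the sample correlation are close to zero, in line with the residual properties stated above.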

6.3. STEP 3: LATENT REGRESSIONS AND DEFINITION OF LATENT 
METHOD VARIABLES ON LEVEL 2 

Given that all latent state variables $S_{tijkl}$ are now measured on 
the same level (Level 2; the target level), it is possible to contrast 
the latent state variables pertaining to different types of methods 
against each other. Following the original CT-C(M-1) approach 
for structurally different methods (Eid, 2000; Eid et al., 2003, 
2008), the latent state variables pertaining to the non-reference 
methods are regressed on the latent state variables pertaining to 
the reference method (in this example self-reports): 

$$E(S_{tij2l} \mid S_{tij1l}) = \alpha_{ij2l} + \lambda_{Sij2l} S_{tij1l} \quad \text{(parent reports)} \tag{12}$$
$$E(S_{tij3l} \mid S_{tij1l}) = \alpha_{ij3l} + \lambda_{Sij3l} S_{tij1l}. \quad \text{(peer reports)} \tag{13}$$

The (independent) latent state variable $S_{tij1l}$ in the latent regres- 
sion analysis denotes the occasion-specific true score measured 
by the reference method (e.g., self-reports). The residuals of 
the latent regression analyses are defined as latent method vari- 
ables. These method variables are also measured on the target 
level (Level 2). With regard to the structurally different non- 
reference method (e.g., parent reports), the method variables can 
be defined as follows: 

$$M_{tij2l} = S_{tij2l} - E(S_{tij2l} \mid S_{tij1l}) = S_{tij2l} - (\alpha_{ij2l} + \lambda_{Sij2l} S_{tij1l}). \tag{14}$$

The method variables $M_{tij2l}$ represent that part of the true par- 
ent reports that cannot be predicted by the self-reports. In other 
words, these method variables capture the occasion-specific part 
of the parent report that cannot be predicted by the self-report. 
As a consequence of the definition of the $M_{tij2l}$ variables as resid- 
ual variables, the latent method variables are uncorrelated with 
the latent state variables [i.e., $\mathrm{Cor}(S_{tij1l}, M_{tij2l}) = 0$] and have an 
expectation (mean) of zero [i.e., $E(M_{tij2l}) = 0$]. For the set of 
interchangeable methods (e.g., peer reports), the method vari- 
ables can be defined as follows: 

$$CM_{tij3l} = S_{tij3l} - E(S_{tij3l} \mid S_{tij1l}) = S_{tij3l} - (\alpha_{ij3l} + \lambda_{Sij3l} S_{tij1l}). \tag{15}$$

The method variables $CM_{tij3l}$ represent that part of the true 
expected peer ratings that is not shared with the self-report on the 
same occasion of measurement. These variables are 
called common method variables, given that they represent 
the perspective of the peers that is shared by all peers, but is 
not shared with the self-reports on a particular occasion of 
measurement. By definition, the latent common method variables 
are uncorrelated with the corresponding latent state variables of 
the reference method [i.e., $\mathrm{Cor}(S_{tij1l}, CM_{tij3l}) = 0$] and have an 
expectation (mean) of zero [i.e., $E(CM_{tij3l}) = 0$]. Moreover, the 
following correlations are assumed to be zero in the LS-COM 
model: 



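$$\mathrm{Cor}(S_{tij1l}, UM_{rti'j'3l'}) = 0, \tag{16}$$

$$\mathrm{Cor}(CM_{tij3l}, UM_{rti'j'3l'}) = 0, \tag{17}$$

$$\mathrm{Cor}(M_{tij2l}, UM_{rti'j'3l'}) = 0, \tag{18}$$

$$\mathrm{Cor}(\epsilon_{rt(ijkl)}, \epsilon_{rt(ijkl)'}) = 0, \tag{19}$$

$$\mathrm{Cor}(\epsilon_{t(ijkl)}, \epsilon_{t(ijkl)'}) = 0, \tag{20}$$

$$\mathrm{Cor}(\epsilon_{rt(ijkl)}, \epsilon_{t(i'j'k'l')}) = 0. \tag{21}$$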



According to Equations (16-18), it is assumed that the Level 1 
unique method variables are uncorrelated with all variables on 
Level 2 (i.e., latent state, latent common method, and latent 
method variables). Equations (19-21) imply that all measure- 
ment error variables belonging to different indicators, different 
constructs, different methods, and different occasions of mea- 
surement are uncorrelated with each other. 
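
The definition of method variables as regression residuals can be made concrete with a small numerical sketch. It is an illustration under simplifying assumptions, not the estimation procedure of the LS-COM model: the latent states are treated as if they were observed, the regression is fitted by ordinary least squares, and the generating values (intercept 0.5, slope 0.6, residual standard deviation 0.8) as well as the function name are hypothetical.

```python
import random


def latent_regression_residuals(reference, non_reference):
    """Regress non-reference states on reference states.

    Returns the intercept, the slope, and the residuals, which play the
    role of the method variables in this illustration.
    """
    n = len(reference)
    mx, my = sum(reference) / n, sum(non_reference) / n
    sxx = sum((x - mx) ** 2 for x in reference)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, non_reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(reference, non_reference)]
    return intercept, slope, residuals


rng = random.Random(3)
# Assumed data-generating values; states are treated as observed for illustration.
self_states = [rng.gauss(0, 1) for _ in range(5000)]
parent_states = [0.5 + 0.6 * s + rng.gauss(0, 0.8) for s in self_states]

alpha, lam, method = latent_regression_residuals(self_states, parent_states)

n = len(method)
mean_self = sum(self_states) / n
mean_method = sum(method) / n
cov_sm = sum((s - mean_self) * (m - mean_method)
             for s, m in zip(self_states, method)) / n
print(f"intercept = {alpha:.2f}, slope = {lam:.2f}")
print(f"mean of method variable = {mean_method:.1e}")
print(f"Cov(reference state, method variable) = {cov_sm:.1e}")
```

The residuals have a mean of zero and are uncorrelated with the reference states by construction, which mirrors the properties of the latent method variables stated above. The same logic applies to the common method variables of the peer reports.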

6.4. STEP 4: DEFINITION OF LATENT METHOD FACTORS 

In order to define latent method factors, it is assumed that 
the latent method variables of the same method only differ by 
multiplicative constants (i.e., $\lambda_{Mij2l}$, $\lambda_{CMij3l}$, $\lambda_{UMij3l}$). According 
to these assumptions, it is possible to define common latent 
method factors that are homogeneous across different indicators 
(i.e., $M_{tj2l}$, $CM_{tj3l}$, $UM_{rtj3l}$): 

$$\text{Level 2:} \quad M_{tij2l} = \lambda_{Mij2l} M_{tj2l}, \tag{22}$$

$$\text{Level 2:} \quad CM_{tij3l} = \lambda_{CMij3l} CM_{tj3l}, \tag{23}$$

$$\text{Level 1:} \quad UM_{rtij3l} = \lambda_{UMij3l} UM_{rtj3l}. \tag{24}$$

The above Equations (22-24) state that the method effects are 
now measured by latent method factors that are common to all 
indicators. 

6.5. STEP 5: DEFINITION OF LATENT STATE FACTORS 

Following a similar logic, it is possible to construe a latent state
factor $S_{tj1l}$ that is common to all indicators:

$S_{tij1l} = \alpha_{ij1l} + \lambda_{Sij1l} S_{tj1l}$. (25)

Overall, the general measurement equation of the LS-COM
model for three methods (e.g., k = 1 = self-report, k = 2 =
parent report, k = 3 = peer reports) and latent state factors ($S_{tj1l}$)
can be expressed by:

$Y_{tij1l} = \alpha_{ij1l} + \lambda_{Sij1l} S_{tj1l} + \epsilon_{tij1l}$, (26)

$Y_{tij2l} = \alpha_{ij2l} + \lambda_{Sij2l} S_{tj1l} + \lambda_{Mij2l} M_{tj2l} + \epsilon_{tij2l}$, (27)

$Y_{rtij3l} = \alpha_{ij3l} + \lambda_{Sij3l} S_{tj1l} + \lambda_{CMij3l} CM_{tj3l} + \lambda_{UMij3l} UM_{rtj3l} + \epsilon_{rtij3l}$. (28)

Equation (26) states that the reference method (e.g., self-report)
indicators are measured only by a latent reference state factor
$S_{tj1l}$ (with an intercept $\alpha_{ij1l}$ and a factor loading parameter
$\lambda_{Sij1l}$) and a latent measurement error variable $\epsilon_{tij1l}$. According
to Equation (27), the indicators pertaining to a structurally different
non-reference method (e.g., parent reports) are measured
by the latent reference state factor $S_{tj1l}$ (with an intercept $\alpha_{ij2l}$
and a factor loading parameter $\lambda_{Sij2l}$), a latent method factor $M_{tj2l}$
(with a factor loading parameter $\lambda_{Mij2l}$), and a measurement
error variable $\epsilon_{tij2l}$. Finally, Equation (28) states that the indicators
belonging to the interchangeable non-reference method (e.g.,
peer reports) are measured by the latent reference state factor $S_{tj1l}$
(with a corresponding intercept $\alpha_{ij3l}$ and factor loading parameter
$\lambda_{Sij3l}$), a latent common method factor $CM_{tj3l}$ at Level 2
and a latent unique method factor $UM_{rtj3l}$ at Level 1 (with corresponding
factor loading parameters $\lambda_{CMij3l}$ and $\lambda_{UMij3l}$), as well
as a measurement error variable $\epsilon_{rtij3l}$.

7. VARIANCE DECOMPOSITION 

Based on the definition of the LS-COM model each indicator's 
variance can be decomposed as follows: 

$Var(Y_{tij1l}) = \lambda^2_{Sij1l} Var(S_{tj1l}) + Var(\epsilon_{tij1l})$, (29)

$Var(Y_{tij2l}) = \lambda^2_{Sij2l} Var(S_{tj1l}) + \lambda^2_{Mij2l} Var(M_{tj2l}) + Var(\epsilon_{tij2l})$, (30)

$Var(Y_{rtij3l}) = \lambda^2_{Sij3l} Var(S_{tj1l}) + \lambda^2_{CMij3l} Var(CM_{tj3l}) + \lambda^2_{UMij3l} Var(UM_{rtj3l}) + Var(\epsilon_{rtij3l})$. (31)

Based on the above variance decomposition (see Equations
29-31), it is possible to define different coefficients for quantifying
convergent validity, method specificity, and reliability (see
Table 1). In contrast to the CS-C(M-1) model, the LS-COM
model allows calculating Level 2 and Level 1 variance coefficients,
because it contains method factors at both Level 1 ($UM_{rtj3l}$) and
Level 2 ($CM_{tj3l}$).
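
The decomposition in Equation (31) can be checked numerically. The following minimal Python sketch draws peer-report scores with the structure of Equation (28), using one state score and one common method score per target (Level 2) and one unique method score and one error per rater (Level 1), and compares the pooled sample variance with the model-implied variance. All parameter values, the normality of the components, and the function names are illustrative assumptions and are not taken from the study.

```python
import random
import statistics

# Illustrative (assumed) population values; they are not estimates from the study.
PARAMS = {
    "alpha": 0.0,    # intercept
    "lam_s": 1.0,    # state factor loading
    "lam_cm": 1.0,   # common method factor loading
    "lam_um": 1.0,   # unique method factor loading
    "var_s": 0.60,   # variance of the reference state factor (Level 2)
    "var_cm": 0.10,  # variance of the common method factor (Level 2)
    "var_um": 0.10,  # variance of the unique method factor (Level 1)
    "var_e": 0.20,   # measurement error variance (Level 1)
}


def implied_variance(p=PARAMS):
    """Model-implied variance of a peer-report indicator, following Equation (31)."""
    return (p["lam_s"] ** 2 * p["var_s"]
            + p["lam_cm"] ** 2 * p["var_cm"]
            + p["lam_um"] ** 2 * p["var_um"]
            + p["var_e"])


def simulate_peer_scores(n_targets=4000, n_raters=5, seed=1, p=PARAMS):
    """Draw peer-report scores with the two-level structure of Equation (28)."""
    rng = random.Random(seed)
    scores = []
    for _target in range(n_targets):
        # Level 2 components: shared by all raters of the same target.
        state = rng.gauss(0.0, p["var_s"] ** 0.5)
        common = rng.gauss(0.0, p["var_cm"] ** 0.5)
        for _rater in range(n_raters):
            # Level 1 components: specific to one rater of this target.
            unique = rng.gauss(0.0, p["var_um"] ** 0.5)
            error = rng.gauss(0.0, p["var_e"] ** 0.5)
            scores.append(p["alpha"] + p["lam_s"] * state + p["lam_cm"] * common
                          + p["lam_um"] * unique + error)
    return scores


if __name__ == "__main__":
    y = simulate_peer_scores()
    print("implied:", round(implied_variance(), 3))
    print("sample: ", round(statistics.pvariance(y), 3))
```

With these assumed values the implied variance is 1.0, so each component variance can be read directly as a share of the observed variance.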

In total, four different consistency coefficients [i.e., $Con(S_{tij2l})$,
$Con(S_{tij3l})$, $Con(S_{rtij3l})$, and the rater consistency coefficient
$RC(S_{rtij3l})$] can be defined. The Level 2 consistency coefficient
$Con(S_{tij2l})$ for the indicators pertaining to the structurally different
non-reference methods represents the amount of true
interindividual differences of the non-reference method (e.g.,
parent report) that can be explained by the reference method
(self-report). The Level 1 consistency coefficient $Con(S_{rtij3l})$ for
the indicators pertaining to the interchangeable non-reference
methods (e.g., peer reports) reflects the amount of true interindividual
differences of the individual peer reports that can be
explained by the reference method (here: self-report).

Sometimes researchers rather seek to know whether peers in
general agree with the student self-reports. In such cases, they
may calculate the Level 2 consistency coefficient $Con(S_{tij3l})$ for
the indicators pertaining to the set of interchangeable methods.
This consistency coefficient captures the amount of true
interindividual differences of the expected peer ratings (the entire
peer group) that can be explained by the reference method (here:
self-reports). Moreover, the true rater consistency coefficient
$RC(S_{rtij3l})$ is defined as the proportion of true interindividual
differences of the peer ratings that are free of measurement
error and rater-specific effects. The rater consistency coefficient
indicates how much true variance of a non-reference indicator
is due to the overall amount of rater agreement (peers and
self-ratings) and not due to measurement error influences or
individual (rater-specific) influences. The true rater consistency
coefficient can also be interpreted as a true intra-class correlation.
Moreover, three different method specificity coefficients
[i.e., $MS(S_{tij2l})$, $CMS(S_{rtij3l})$, and $UMS(S_{rtij3l})$] can be analyzed.
The method specificity coefficients $MS(S_{tij2l})$ indicate the proportion
of true variance of a non-reference method indicator pertaining
to a structurally different method (e.g., parent reports) that is



www.frontiersin.org 



April 2014 | Volume 5 | Article 311 | 7 



Koch et al. 



Longitudinal MTMM analysis 



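Table 1 | Variance components of the non-reference method indicators in LS-COM model.

| Coefficient | Level | Method | Definition |
| --- | --- | --- | --- |
| Consistency | Level 2 | Struct. different | $Con(S_{tij2l}) = \frac{\lambda^2_{Sij2l} Var(S_{tj1l})}{Var(Y_{tij2l}) - Var(\epsilon_{tij2l})}$ |
| Consistency | Level 2 | Interchangeable | $Con(S_{tij3l}) = \frac{\lambda^2_{Sij3l} Var(S_{tj1l})}{\lambda^2_{Sij3l} Var(S_{tj1l}) + \lambda^2_{CMij3l} Var(CM_{tj3l})}$ |
| Consistency | Level 1 | Interchangeable | $Con(S_{rtij3l}) = \frac{\lambda^2_{Sij3l} Var(S_{tj1l})}{Var(Y_{rtij3l}) - Var(\epsilon_{rtij3l})}$ |
| Consistency | Level 1 | Interchangeable | $RC(S_{rtij3l}) = \frac{\lambda^2_{Sij3l} Var(S_{tj1l}) + \lambda^2_{CMij3l} Var(CM_{tj3l})}{Var(Y_{rtij3l}) - Var(\epsilon_{rtij3l})}$ |
| Method specificity | Level 2 | Struct. different | $MS(S_{tij2l}) = \frac{\lambda^2_{Mij2l} Var(M_{tj2l})}{Var(Y_{tij2l}) - Var(\epsilon_{tij2l})}$ |
| Method specificity | Level 2 | Interchangeable | $CMS(S_{rtij3l}) = \frac{\lambda^2_{CMij3l} Var(CM_{tj3l})}{Var(Y_{rtij3l}) - Var(\epsilon_{rtij3l})}$ |
| Method specificity | Level 1 | Interchangeable | $UMS(S_{rtij3l}) = \frac{\lambda^2_{UMij3l} Var(UM_{rtj3l})}{Var(Y_{rtij3l}) - Var(\epsilon_{rtij3l})}$ |
| Reliability | Level 2 | Struct. different | $Rel(Y_{tij2l}) = 1 - \frac{Var(\epsilon_{tij2l})}{Var(Y_{tij2l})}$ |
| Reliability | Level 1 | Interchangeable | $Rel(Y_{rtij3l}) = 1 - \frac{Var(\epsilon_{rtij3l})}{Var(Y_{rtij3l})}$ |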

not determined by the reference method (e.g., self-report). The
unique method specificity coefficient $UMS(S_{rtij3l})$ represents the
proportion of true variance of a non-reference method indicator
pertaining to the interchangeable methods that is neither shared
with the self-reports nor with other peers. Hence, this coefficient
reflects the unique view of a particular rater on a particular
occasion of measurement. The common method specificity coefficient
$CMS(S_{rtij3l})$ reflects the proportion of true interindividual
differences of the peer ratings that cannot be explained by the reference
method (here: self-reports), but that is shared by other peers
(Eid et al., 2008). Hence, this coefficient can also be interpreted
as "rater consensus" with respect to the peer ratings that is not
shared with the reference method.
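
As a worked illustration of the coefficients described above, the following Python sketch computes them for a single peer-report indicator from its variance components. The function name and the numeric inputs are hypothetical. The formulas follow the verbal definitions in the text, in which each consistency and method specificity coefficient is a share of the true (error-free) variance; the reliability line assumes the usual ratio of true to observed variance.

```python
def peer_indicator_coefficients(lam_s, lam_cm, lam_um, var_s, var_cm, var_um, var_e):
    """Variance coefficients for one interchangeable-method (peer) indicator.

    Inputs are factor loadings and variances of the state factor, the common
    method factor, the unique method factor, and the measurement error.
    """
    state = lam_s ** 2 * var_s      # variance explained by the reference method
    common = lam_cm ** 2 * var_cm   # variance shared among peers only
    unique = lam_um ** 2 * var_um   # rater-specific variance
    true_var = state + common + unique
    observed = true_var + var_e
    return {
        "Con_L1": state / true_var,            # Level 1 consistency
        "Con_L2": state / (state + common),    # Level 2 consistency (expected peer rating)
        "RC": (state + common) / true_var,     # rater consistency
        "CMS": common / true_var,              # common method specificity
        "UMS": unique / true_var,              # unique method specificity
        "Rel": true_var / observed,            # reliability (assumed standard definition)
    }


if __name__ == "__main__":
    # Hypothetical variance components (unit loadings for simplicity).
    coefs = peer_indicator_coefficients(1.0, 1.0, 1.0, 0.60, 0.10, 0.10, 0.20)
    for name, value in coefs.items():
        print(f"{name}: {value:.3f}")
```

By construction, the Level 1 consistency, common method specificity, and unique method specificity sum to one, and the rater consistency equals the sum of the first two.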

8. PERMISSIBLE CORRELATIONS 

Figure 4 shows a path diagram of an LS-COM model with three
indicators per TMU, one construct, three methods and three
occasions of measurement. As illustrated in the figure, the latent
state factors can be correlated with each other.
Correlations between latent state factors pertaining to the same 
construct (e.g., empathy) and different occasions of measure- 
ment can be interpreted as indicators of construct stability. High 
positive correlations indicate that the construct is rather stable 
across time. Correlations between latent state factors pertaining 
to different constructs and the same occasion of measurement 
can be interpreted in terms of discriminant validity. High cor- 
relations indicate low discriminant validity at a given moment 
in time. Correlations between latent state factors pertaining to 
different constructs and different measurement occasions may 



be interpreted as coefficients of predictive validity. For example,
students' self-reported level of empathy measured on the
first occasion of measurement ($S_{t111}$) may be indicative of the
self-reported level of relational aggression measured on the second
occasion of measurement ($S_{t212}$). Moreover, correlations
between the occasion-specific latent method factors pertaining
to the same measurement level are permitted in the LS-COM
model.

The stability of method (rater) effects can be investigated
by correlations between method factors pertaining to the same
construct, same method, and different occasions of measurement.
For example, correlations between the unique method
factors $UM_{rtj3l}$ and $UM_{rtj3l'}$ (where $l \neq l'$) indicate to what
extent the individual rater-specific effects remain stable across
time. Following a similar logic, the correlations between common
method factors $CM_{tj3l}$ and $CM_{tj3l'}$ (where $l \neq l'$) indicate
to what extent the common rater effects (i.e., rater effects
that are not shared with the self-report, but are shared with
all other raters belonging to a particular target) remain stable
across time. The generalization of method effects across
constructs is indicated by correlations between method factors 
pertaining to different constructs (e.g., empathy and relational 
aggression). For example, a negative correlation between the 
method factors pertaining to the peer reports assessing empathy 
and relational aggression would indicate that peers who tend to 
underestimate students' self-reported empathy level, tend to over- 
rate students' self-reported aggression level and vice versa. The 
generalization of method effects across different methods (rater 
types) is indicated by the correlation between method factors 
FIGURE 4 | The LS-COM model with latent state factors and three
indicators (i = 3), one construct (j = 1), two structurally different and
one set of interchangeable methods (k = 3), and three occasions of
measurement (l = 3), where r: rater and t: target. $Y_{rtijkl}$: observed variables
at Level 1, $Y_{tijkl}$: observed variables at Level 2, $S_{tjkl}$: latent state factors, $\lambda_{Sijkl}$:
state factor loadings, $M_{tjkl}$: latent method factors, $\lambda_{Mijkl}$: method factor
loadings, $CM_{tjkl}$: latent common method factors, $\lambda_{CMijkl}$: common method
factor loadings, $UM_{rtjkl}$: latent unique method factors, $\lambda_{UMijkl}$: unique method
factor loadings, $\epsilon_{rtijkl}$: measurement errors at Level 1, and $\epsilon_{tijkl}$: measurement
errors at Level 2. Intercepts ($\alpha_{ijkl}$) are not depicted in the figure. In this
example, one of the structurally different methods (Method 1) serves as
reference method.



pertaining to different methods (e.g., parents and peers). For
example, correlations between the two method factors $M_{tj2l}$ and
$CM_{tj3l}$ indicate whether peers and parents deviate in a similar
way (i.e., show a shared bias) from the self-reports on occasion of
measurement l.

8.1. MEAN CHANGE 

In addition to the investigation of the latent correlation as well as 
the variance components (provided in Table 1), many researchers 
seek to scrutinize the mean change of a particular construct across 



time. According to Equations (26-28), the expectation (mean) of
the latent state factor can be identified as follows:

$E(Y_{tij1l}) = E(\alpha_{ij1l}) + E(\lambda_{Sij1l} S_{tj1l}) + E(\epsilon_{tij1l})$, (32)

$E(Y_{tij1l}) = \alpha_{ij1l} + \lambda_{Sij1l} E(S_{tj1l})$. (33)

Given that $\epsilon_{tij1l}$ is a zero-mean normally distributed residual
variable, the latent mean of the latent state variables can be identified
by setting one intercept for each latent state factor to zero
[e.g., $E(\alpha_{ij1l}) = 0$] and the corresponding factor loading to one
[i.e., $\lambda_{Sij1l} = 1$]. Another possibility is to set the intercepts and
factor loadings equal across time (i.e., assuming strong measurement
invariance; see below) and to set the latent mean of
the first latent state factor to zero. Then, the latent means of
the remaining latent state variables reflect the true mean change
of construct j from occasion of measurement 1 to occasion of
measurement l.
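
The second identification strategy can be written out in one step. The following is a sketch under the stated assumptions only: strong MI for the reference indicators (a time-invariant intercept $\alpha_{ij1}$ and loading $\lambda_{Sij1}$, notation introduced here for illustration) and a first latent mean fixed to zero. Under these assumptions, the observed mean difference of any reference indicator is the latent mean change rescaled by the loading:

```latex
\begin{align*}
E(Y_{tij1l}) - E(Y_{tij11})
  &= \bigl[\alpha_{ij1} + \lambda_{Sij1} E(S_{tj1l})\bigr]
   - \bigl[\alpha_{ij1} + \lambda_{Sij1} E(S_{tj11})\bigr] \\
  &= \lambda_{Sij1} \bigl[E(S_{tj1l}) - E(S_{tj11})\bigr] \\
  &= \lambda_{Sij1} E(S_{tj1l}) \qquad \text{if } E(S_{tj11}) = 0 .
\end{align*}
```

If the intercepts or loadings differed across occasions, the first line would not simplify in this way, and observed mean differences would confound latent change with changes in the measurement scale.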

8.2. TESTING MEASUREMENT INVARIANCE ACROSS TIME 

Measurement invariance (MI) plays an important role in lon- 
gitudinal analysis (Meredith, 1993; Widaman and Reise, 1997; 
Geiser et al., 2014). According to Widaman and Reise (1997), four
different degrees of MI can be distinguished: (1) configural MI,
(2) weak MI, (3) strong MI, and (4) strict MI. Configural MI
implies that the number of factors as well as the factor structure 
as such is similar across different measurement occasions. In 
addition to configural MI, weak MI requires that the factor 
loading parameters for each indicator i are equal across different 
occasions of measurement. In addition to weak MI, strong 
MI assumes that the intercepts of indicators i are equal across 
different occasions of measurement. The most restrictive form 
of longitudinal MI (i.e., strict MI) implies that, in addition to 
the previous restrictions, the residual variances of indicators i are 
also equal across time. In the current work, we focus on one (not 
all possible) MI restriction that can be specified and empirically 
tested. In particular, we discuss the minimal set of restrictions 
that are necessary to meaningfully study mean change (i.e., strong 
MI) with respect to the reference method (e.g., self-reports). To 
meaningfully interpret mean change in the reference state factors, 
we recommend that the following MI restrictions be tested: 

$\alpha_{ij1l} = \alpha_{ij1l'} = \alpha_{ij1}$, where $l \neq l'$, (34)

$\lambda_{Sij1l} = \lambda_{Sij1l'} = \lambda_{Sij1}$, where $l \neq l'$. (35)

The above restrictions (34) and (35) state that the intercept 
and factor loading parameters of the reference state factors are 
time-invariant. These assumptions imply that the scale on which 
the reference latent state factors are measured does not change 
across time. Hence, researchers who are interested in studying 
mean change as measured by the reference method (e.g., self- 
reports) should at least establish strong MI as proposed above (see 
Equations 34-35). LS-COM models implying different degrees of
MI can be compared by using a $\chi^2$-difference test. For calculating
level-specific $\chi^2$-difference tests, see Ryu and West (2009).
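
To make the model comparison concrete, the following Python sketch computes an ordinary $\chi^2$-difference test for two nested models. It is only an illustration under simplifying assumptions: it uses the unscaled maximum likelihood $\chi^2$ values of the restricted and the unrestricted model, it does not implement the level-specific procedure of Ryu and West (2009), and the function names and fit values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the finite-sum closed forms for even and odd degrees of freedom.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    term = math.sqrt(x)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Return (delta chi2, delta df, p-value) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    if delta_df <= 0 or delta_chi2 < 0:
        raise ValueError("the restricted model must have more df and a larger chi-square")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


if __name__ == "__main__":
    # Hypothetical fit values: strong MI (restricted) vs. weak MI (less restricted).
    d_chi2, d_df, p = chi2_difference_test(112.4, 60, 104.6, 56)
    print(f"delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.3f}")
```

A non-significant difference would suggest that the additional invariance restrictions do not worsen model fit appreciably.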

9. SIMULATION STUDY 

To investigate the performance of the LS-COM model proposed 
throughout the previous sections, a Monte Carlo (MC) simula- 
tion study was performed. The main purpose of the simulation 
was to examine the applicability of the LS-COM model across 
a range of conditions and to establish a set of guidelines and 
recommendations concerning sample size and model complexity 
that ensure consistent and unbiased estimation of parameters and 
their standard errors and minimize potential estimation problems 
(so-called Heywood cases).



9.1. RESULTS OF PREVIOUS SIMULATION STUDIES 

Numerous simulation studies have been performed in the past 
focusing on the applicability and robustness of the single-level 
(classical) SEMs (e.g., Boomsma, 1982; Gerbing and Anderson, 
1985; MacKinnon et al., 1995; Marsh et al., 1998; Fan et al.,
1999; Raykov, 2000; Enders and Bandalos, 2001; Jackson, 2001;
Bandalos, 2002). So far, only a few simulation studies have been
carried out investigating complex multilevel structural equation
models (e.g., Satorra and Muthén, 1995; Hox and Maas, 2001;
Julian, 2001; Stapleton, 2002; Maas and Hox, 2005) or longitudinal
CFA-MTMM models (Crayen, 2008; Geiser, 2009). In this
section, we briefly summarize the results of previous simulation 
studies that are most relevant to the present study. 

With regard to single-level (classical) SEMs a ratio of 5 (some- 
times 10) observations per parameter has been suggested to 
ensure reliable parameter estimates and standard errors (Bentler 
and Chou, 1987; Bollen, 1989, 2002). With regard to multilevel
(two-level) SEMs, simulation studies indicate that the number of
Level 2 units is more important than the number of Level 1 units,
suggesting that at least 100 Level 2 units be sampled for accurate
standard error estimates and for detecting small effects on
Level 2 (Hox and Maas, 2001; Maas and Hox, 2005; Meuleman
and Billiet, 2009). It has also been found that ignoring the multilevel
structure completely can lead to biased parameter estimates
as well as biased standard errors (Julian, 2001). Recent simulation
studies favor the use of Bayesian estimation techniques, showing
that 20 Level 2 units can be sufficient for reliable parameter estimates
when using weakly informative priors (Hox et al., 2012).
Nevertheless, for maximum likelihood estimation, it has been generally
recommended to sample at least 100 Level 2 units to ensure
reliable parameter and standard error estimates (Hox and Maas,
2001; Maas and Hox, 2005; Meuleman and Billiet, 2009).

Simulation studies examining the statistical performance of
longitudinal CFA-MTMM [i.e., CS-C(M-1)] models have shown
that the parameter estimates and their standard errors are well
recovered in general. Nevertheless, the standard errors seem to
be more sensitive to bias than the parameter estimates (Crayen,
2008; Geiser, 2009). Moreover, the statistical performance of the
CS-C(M-1) model increases with larger sample sizes (i.e., more
empirical information), fewer constructs and methods (i.e., less
complex models) and with low convergent validity (i.e., increasing
method bias) (Crayen, 2008; Geiser, 2009).

Given that the CS-C(M-1) model by Geiser (2009) is a single-level
confirmatory factor model, it is not clear to what extent the
results described above apply to the LS-COM model. Similarly, 
the results of the simulation studies examining the performance 
of multilevel structural equation models (ML-SEM) may also not 
apply to the LS-COM model, given that the models used in those 
simulation studies are usually less complex (including only a few 
latent factors and no complex MTMM structure). 

9.2. DESIGN OF THE SIMULATION STUDY 

To investigate the effect of model complexity and sample size 
on estimation problems and precision it was necessary to vary a 
number of potentially influential factors. Because the LS-COM 
model is a longitudinal multilevel CFA-MTMM model, three 
main factors influence model complexity. To distinguish
their influences, (a) the number of constructs (1 vs. 2), (b) the
number of methods (2 vs. 3), and (c) the number of occasions of
measurement (2, 3, and 4) were varied independently.

In addition to these sources of model complexity, real-life 
applications of MTMM analysis vary greatly in the degree of 
convergent validity between the employed methods. To inves- 
tigate whether convergent validity has an effect on the qual- 
ity of the estimation this factor was also varied in two levels 
(high vs. low convergent validity). We used the coefficients of 
consistency, method specificity, and reliability to specify the 
true (population) model parameters. The degree of consistency 
and method specificity were allowed to differ across MC con- 
ditions, implying a condition of high consistency (i.e., high 
convergent validity) and a condition of low consistency (i.e., 
low convergent validity). The reliability of each indicator was 
obtained by the sum of the consistency and method specificity 
coefficients (range: 0.775-0.825). Table 2 shows the population 
values for the different variance components for the different 
indicators. 

Due to the multilevel structure of the LS-COM model, sample
size can be varied on the level of targets (Level 2) as well as on
the level of the interchangeable raters (Level 1). As with model
complexity, these two factors were varied independently of each
other. The number of Level 2 units was set at 100, 250, and 500
targets, while the number of Level 1 units was set at 2, 5,
10, and 20 raters per target.

In total, this simulation design resulted in 2 × 2 × 2 × 3 × 4 ×
3 = 288 possible conditions. Of these 288, only 232 were included,
because the remaining 56 conditions represented constellations
in which the model is underidentified due to there being fewer
targets than free model parameters. Of these 56 conditions, 50
were conditions with only 100 Level 2 units, and all but 8
represented models with 2 constructs.
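
The fully crossed design can be enumerated in a few lines. The following Python sketch is only an illustration of the factor crossing described above; the dictionary keys are hypothetical labels, and the sketch does not reproduce the exclusion of the 56 underidentified conditions, because that would require the number of free parameters of each model, which is not listed here.

```python
from itertools import product

# Design factors as described in the text (key names are illustrative labels).
design = {
    "constructs": (1, 2),
    "methods": (2, 3),
    "occasions": (2, 3, 4),
    "consistency": ("low", "high"),
    "targets": (100, 250, 500),          # Level 2 units
    "raters_per_target": (2, 5, 10, 20), # Level 1 units
}

# One dictionary per cell of the fully crossed design.
conditions = [dict(zip(design, levels)) for levels in product(*design.values())]

# Total number of observations (rater-by-target pairs) in each condition.
n_observations = [c["targets"] * c["raters_per_target"] for c in conditions]

if __name__ == "__main__":
    print(len(conditions))                           # number of possible conditions
    print(min(n_observations), max(n_observations))  # range of observations per data set
    print((len(conditions) - 56) * 500)              # data sets for the included conditions
```

The smallest cell combines 100 targets with 2 raters each and the largest combines 500 targets with 20 raters each, which gives the range of observations per simulated data set.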

Overall, 116,000 (232 × 500) data sets with a varying number
of observations (200-10,000) were simulated using Mplus
6.1 (Muthén and Muthén, 2010), the free software R 2.14.0
(R Core Team, 2014), as well as various R packages such
as MplusAutomation (Hallquist, 2011), OpenMx (Boker
et al., 2011), and corcounts (Erhardt, 2013). All files of
this simulation study can be downloaded from the following


Table 2 | Consistency, method specificity and reliability of the
LS-COM model.

| | Low consistency: Mean | Low consistency: SD | High consistency: Mean | High consistency: SD |
| --- | --- | --- | --- | --- |
| Consistency | 0.30 | (±0.025) | 0.60 | (±0.025) |
| Unique method specificity | 0.25 | (±0.025) | 0.10 | (±0.025) |
| Common method specificity | 0.25 | (±0.025) | 0.10 | (±0.025) |
| Method specificity | 0.50 | (±0.050) | 0.20 | (±0.050) |
| Reliability | 0.80 | (±0.025) | 0.80 | (±0.025) |

The variance coefficients above were standardized with regard to the observed
variance of an indicator. Values in parentheses indicate the variation in standard
deviations of the consistency and method specificity values across different
indicators.



website (see footnote 3). An example Mplus syntax for the simulation study is
provided in Appendix B in Supplementary Material.

Strong MI was assumed in all models (cf. Widaman and Reise,
1997). All models were estimated using the maximum likelihood
estimator implemented in Mplus assuming multivariate normally
distributed and complete data.

9.3. EVALUATION CRITERIA 

The performance of the LS-COM model was examined using the
following criteria: (a) rate of non-convergence after a maximum
of 1000 iterations, (b) rate of improper solutions (i.e., Heywood
cases; see footnote 4) due to non-positive definite covariance matrices $\Psi$ and $\Theta$,
(c) the amount of parameter estimation bias and standard error
bias, and (d) the accuracy of the $\chi^2$ model fit statistics.

The absolute parameter bias was first calculated for each
parameter p and then aggregated across all parameters of the same
parameter type c for which effects were presumed to be equal
(e.g., all common method factor loadings, $\lambda_{CMij3l}$; cf. Bandalos,
2006):

$peb_c = \frac{1}{n_c} \sum_{p=1}^{n_c} \left| \frac{M_{pc} - e_{pc}}{e_{pc}} \right|$. (36)

$M_{pc}$ is the average of the MC parameter estimates across all 500
replications for parameter p of parameter type c, whereas $e_{pc}$ is
the true population value of that parameter. $n_c$ is the number of
parameters in cluster c.

In a similar way, the absolute standard error bias was
calculated:

$seb_c = \frac{1}{n_c} \sum_{p=1}^{n_c} \left| \frac{M_{SE_{pc}} - SD_{pc}}{SD_{pc}} \right|$. (37)

$M_{SE_{pc}}$ is the average standard error of parameter p allotted to
parameter type c across all 500 MC replications, whereas $SD_{pc}$ is
the standard deviation of the parameter estimate for parameter p
in cluster c across all 500 MC replications.
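
The two bias measures can be illustrated with a small Python sketch. It assumes, following the verbal description above, that the parameter bias is the absolute deviation of the mean estimate from the population value relative to the population value, and that the standard error bias is the absolute deviation of the mean standard error from the empirical standard deviation of the estimates relative to that standard deviation, each averaged over the parameters of one type. The function names, the use of the sample standard deviation, and the toy numbers are assumptions made for illustration.

```python
import statistics


def parameter_bias(estimates_by_param, true_values):
    """Average absolute relative parameter bias for one parameter type.

    estimates_by_param: one list of replication estimates per parameter.
    true_values: the population value of each parameter (same order).
    """
    relative = [
        abs(statistics.fmean(estimates) - true) / abs(true)
        for estimates, true in zip(estimates_by_param, true_values)
    ]
    return statistics.fmean(relative)


def standard_error_bias(ses_by_param, estimates_by_param):
    """Average absolute relative standard error bias for one parameter type.

    The empirical standard deviation of the estimates across replications
    serves as the reference value for the mean estimated standard error.
    """
    relative = []
    for ses, estimates in zip(ses_by_param, estimates_by_param):
        empirical_sd = statistics.stdev(estimates)  # sample SD (assumption)
        relative.append(abs(statistics.fmean(ses) - empirical_sd) / empirical_sd)
    return statistics.fmean(relative)


if __name__ == "__main__":
    # Toy example: two parameters of the same type, three replications each.
    estimates = [[0.9, 1.1, 1.0], [0.5, 0.6, 0.7]]
    true_values = [1.0, 0.5]
    standard_errors = [[0.11, 0.11, 0.11], [0.08, 0.08, 0.08]]
    print(round(parameter_bias(estimates, true_values), 3))
    print(round(standard_error_bias(standard_errors, estimates), 3))
```

In an actual Monte Carlo study each inner list would hold the 500 replication results of one parameter, and values above the chosen cutoff would flag a parameter type as biased.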

The aggregation of the absolute parameter estimation and
standard error biases was done for two reasons. First, the LS-COM
model incorporates many free parameters to be estimated
(sometimes more than 100) and it would not be feasible to report
bias for each single model parameter. Second, it is reasonable
to assume that similar parameters (e.g., all measurement error
variances) are biased in a similar way. Hence, by aggregating
parameters that belong to the same parameter type, it was possible
to investigate general bias in parameter estimates and their standard
errors. In total, 12 types of parameters were defined. Eight of
these stemmed from the between part of the model: (1) the state
factor loadings ($\lambda_S$), (2) the common method factor loadings
($\lambda_{CM}$), (3) the method factor loadings ($\lambda_M$), (4) the covariances
of latent variables on Level 2 ($cov_{L2}$), (5) the latent means ($\mu$), (6)
the latent intercepts ($\alpha$), (7) the variance of the latent variables at
Level 2 ($var_{L2}$), and (8) the Level 2 residual variances ($\epsilon_{L2}$). The



3 http://www.ewi-psy.fu-berlin.de/einrichtungen/arbeitsbereiche/psymeth/mitarbeiter/tkoch/index.html.

4 $\Psi$-warning messages indicate linear dependencies in the covariance matrix
of the latent variables, whereas $\Theta$-warning messages indicate estimation problems
with regard to the latent error variables (e.g., negative error variances).
remaining four parameter clusters all pertained to parameters at
Level 1: (9) the unique method factor loadings ($\lambda_{UM}$), (10) the
unique method factor variances ($var_{L1}$), (11) the covariances of
the unique method factors ($cov_{L1}$), and (12) the Level 1 residual
variances ($\epsilon_{L1}$).

In line with previous MC simulation studies investigating
MTMM-SEMs (e.g., Nussbeck et al., 2006; Geiser, 2009), 0.10 was
chosen as a cut-off criterion for both parameter and SE biases, and
absolute values beyond this threshold were deemed unacceptable.

10. RESULTS 

10.1. RATE OF NON-CONVERGENCE 

All 116,000 specified LS-COM (H0) models converged properly
within 1000 iterations. 

10.2. RATE OF IMPROPER SOLUTIONS 

Mplus warning messages regarding potential $\Psi$-problems were
encountered in 65 out of 232 (28.02%) MC conditions, but
in only 2,366 out of 116,000 (2.04%) total replications in the
simulation. The main reason for the $\Psi$-warning messages was
linear dependencies in the latent covariance matrix due to higher
order partial correlations above $|1|$. Moreover, only 2 out of
232 MC conditions contained improper solutions with regard to the
latent residual matrix $\Theta$. Hence, the actual amount of improper
solutions in this simulation study was below 5%.

Most of the conditions exhibiting general warning messages
were high consistency conditions (i.e., 56 MC conditions and
2,306 out of 116,000 replications, 1.99%) and only a few were
low consistency conditions (i.e., 9 MC conditions and 60 out
of 116,000 replications, 0.05%). Moreover, the frequency of
warning messages decreased with increasing sample size on Level
1 (number of raters per target) as well as with increasing sample
size on Level 2 (number of targets). Figure 5 shows the relationship
between the average amount of $\Psi$-warning messages and the
sample size on both levels in the low and the high consistency
conditions. The amount of $\Psi$-warning messages
decreased substantially with the number of targets as well
as with the number of raters per target. Figure 5 also indicates
that the number of raters per target might be more important for
the reduction of $\Psi$-warning messages than the number of Level 2
units (here: targets).

10.3. AMOUNT OF PARAMETER AND STANDARD ERROR BIAS 

Across all 232 conditions the absolute parameter estimation bias
(peb, see Equation 36) was below the cutoff value of 10%.
However, the absolute standard error bias (seb, see Equation 37)
exceeded the value of 10% in 21 out of 232 MC conditions. Higher
seb values were more often found in the high consistency conditions
(14 out of 21 conditions, 66.67%) than in the low consistency
conditions (7 out of 21 conditions, 33.33%). Figure 6 shows the
average peb and seb values across all parameters with respect to
the sample size on Level 1 and Level 2 as well as with respect to
the consistency condition (high vs. low).

Figure 6 shows that the average peb and seb values decreased 
with increasing sample size on Level 1 and Level 2. In particular, 
the sample size on Level 1 (number of raters per target) seemed 
to be crucial for the reduction of the seb. Moreover, Figure 6 
shows that the average amount of peb and seb was lower in the 
low consistency condition than in the high consistency condition. 
Note that the average peb and seb (i.e., across all parameters)
were below 10% (see Figure 6). Further investigations revealed
that specific LS-COM model parameters were more sensitive to
bias than others. Specifically, the common method factor loadings
$\lambda_{CM}$, method factor loadings $\lambda_M$, unique method factor loadings
$\lambda_{UM}$, as well as the variances of unique method factors $var_{L1}$
showed the largest standard error biases. Additionally, the seb of
the latent means on Level 2 exceeded the cutoff value of 10%
in one single MC condition (i.e., one construct, two methods,
two occasions of measurement, 10 Level 1 units and 100 Level
2 units). Figure 7 shows the dependency of the seb values on the
sample size at each of the measurement levels in the high and low
consistency condition.

According to Figure 7, the standard error bias was substan- 
tially reduced with increasing sample size on both levels. In 
particular, the standard error bias dropped below the cutoff value 
of 10% when more than 2 raters per target were sampled. 
FIGURE 5 | Average number of $\Psi$-warning messages in high and low consistency conditions. $n_{L1}$ = number of Level 1 units; $n_{L2}$ = number of Level 2 units.

FIGURE 6 | Average peb and seb values with respect to sample size in high and low consistency conditions in the LS-COM model. $n_{L1}$ = number of
Level 1 units; $n_{L2}$ = number of Level 2 units.



10.4. $\chi^2$-FIT-STATISTICS

In Figures 8A,B the simulated and expected proportions of the $\chi^2$
values for monoconstruct and multiconstruct LS-COM models
are presented. According to these results, the simulated $\chi^2$-values
were always below the theoretically expected $\chi^2$-values, indicating
a downward bias in the asymptotic type I error. These results suggest
that too many specified LS-COM models would be accepted
with respect to a nominal alpha level of 0.05 if researchers used
the theoretical $\chi^2$ distribution to test the model fit. Hence, the $\chi^2$
model fit test appeared to be too liberal with respect to LS-COM
models under the conditions studied here. However, the differences
between the observed and the expected $\chi^2$-distributions
at a nominal alpha level of 5% were relatively small (on average
0.03 for the monoconstruct condition and 0.04 for the multiconstruct
condition). The results also indicate that the $\chi^2$ model fit test was
more accurate for less complex (i.e., monoconstruct) LS-COM
models. We did not find a straightforward relationship between
sample size and the accuracy of the $\chi^2$ model fit test for the
LS-COM model.

11. DISCUSSION 

In the present work a multilevel longitudinal CFA-MTMM model 
for the combination of structurally different and interchangeable 



methods (called LS-COM model) was proposed. The LS-COM 
model combines the advantages of multilevel, longitudinal, and 
CFA-MTMM modeling approaches and is suitable for MTMM 
measurement designs combining different types of methods. 
Given that such complex MTMM measurement designs are 
increasingly used in psychology (e.g., 360° feedback designs,
multisource, multirater designs), the LS-COM model fills a gap in the
current literature on longitudinal MTMM modeling. Previous
studies on longitudinal MTMM modeling have either focused
exclusively on single-indicator models or on a specific type of
method (e.g., structurally different methods) (e.g., Kenny and
Zautra, 2001; Burns and Haynes, 2006; Courvoisier et al., 2008;
Grimm et al., 2009; Geiser et al., 2010). In the present article
a new CFA-MTMM model has been developed allowing the
simultaneous analysis of different types of methods (i.e., struc- 
turally different and interchangeable methods) across time using 
a multiple indicator, multilevel latent variable approach. The LS- 
COM model overcomes many limitations of previous models by 
allowing researchers to 

1. study method effects on different levels (rater and target level), 

2. analyze the stability and change of construct and method 
effects across time, 
3. evaluate the convergent and discriminant validity among different
methods across time,

4. investigate the stability and change of a given construct 
(attribute) across time, 

5. examine different variance coefficients and the psychome- 
tric properties of the measures on multiple occasions of 
measurement, 

6. test important assumptions (e.g., measurement invariance), 
and 

7. study potential causes of method effects by including external 
variables in the model. 

Moreover, the LS-COM model is defined on the basis of stochastic 
measurement theory (Suppes and Zinnes, 1963; Zimmerman, 
1975; Steyer and Eid, 2001), which has the advantage of defining 
the latent variables as random variables with a clear psychometric 
interpretation. That is, the latent variables in the 
LS-COM model are not simply assumed, but properly defined 
as random variables on the probability space (see Appendix A in 
Supplementary Material for the formal definitions). In addition, 
the LS-COM model uses a latent regression [CT-C(M-1)] modeling 
approach, which allows contrasting different methods against 
a reference method. The CT-C(M-1) modeling approach has 
the advantage of using "pure" method factors by defining the 
method variables as latent residual variables (see Geiser et al., 
2008, for more details). In addition, the CT-C(M-1) modeling 
approach allows separating the total variance of each indicator 
into state, method, and measurement error components and 
calculating different variance coefficients (e.g., coefficients of 
consistency, method specificity, reliability), which is not possible in 
other MTMM modeling approaches [e.g., latent means (Pohl and 
Steyer, 2010) and latent difference modeling (Pohl et al., 2008) 
approaches]. 
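
The variance decomposition described above can be illustrated with a minimal Python sketch. It assumes a simplified indicator with one state factor and one method factor that are uncorrelated with each other and with the error term; in the full LS-COM model the method part is further split into common and unique method components. The class `IndicatorParameters`, the function `variance_coefficients`, and all numerical values are hypothetical and not taken from the article. Consistency and method specificity are expressed here relative to the error-free (true-score) variance, which is one common convention.

```python
from dataclasses import dataclass


@dataclass
class IndicatorParameters:
    """Hypothetical parameters of one non-reference indicator."""
    state_loading: float    # loading on the latent state factor
    state_variance: float   # variance of the latent state factor
    method_loading: float   # loading on the latent method factor
    method_variance: float  # variance of the latent method factor
    error_variance: float   # variance of the measurement error


def variance_coefficients(p: IndicatorParameters) -> dict:
    """Split the indicator variance into state, method, and error parts.

    Assumes (simplification) that state, method, and error variables are
    mutually uncorrelated, so their variance contributions simply add up.
    """
    state_part = p.state_loading ** 2 * p.state_variance
    method_part = p.method_loading ** 2 * p.method_variance
    true_variance = state_part + method_part
    total_variance = true_variance + p.error_variance
    return {
        # share of error-free variance explained by the reference state
        "consistency": state_part / true_variance,
        # share of error-free variance that is method-specific
        "method_specificity": method_part / true_variance,
        # share of total variance that is not measurement error
        "reliability": true_variance / total_variance,
    }


# Illustrative values only (not estimates from the article).
example = IndicatorParameters(1.0, 0.5, 1.0, 0.3, 0.2)
coefficients = variance_coefficients(example)
```

By construction, consistency and method specificity sum to one in this sketch, so a low consistency coefficient directly indicates a strong method effect relative to the reference method.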



Researchers who are interested in studying the mean change 
of an attribute across time should first test the degree of measurement 
invariance and then estimate the latent means of the 
latent state factors as described above. In addition, the stability 
and change of interindividual differences in an attribute 
can be investigated by means of the correlations between the latent state 
factors pertaining to different occasions of measurement. The stability 
and change of the method factors across time can be studied 
by means of the correlations between the latent method factors 
measured on different occasions of measurement. In total, 
three different types of method effects can be examined: first, 
the method effect of the structurally different method (e.g., parent 
reports) that is not shared with the reference method (e.g., 
self-reports); second, the common method effect of the interchangeable 
methods (e.g., general peer rating) that is not shared 
with the reference method (e.g., self-reports); and third, the unique 
method effect of the interchangeable methods (e.g., single peer 
rating) that is shared neither with the reference method (e.g., 
self-reports) nor with other peers. A meaningful interpretation 
of correlation coefficients between method factors across time 
(e.g., as stability of method effects) typically requires that the 
same raters are recruited at each time point. The generalizability 
of the method effects can be examined by the correlations between 
latent method factors pertaining to different types of methods 
(structurally different and interchangeable methods). 

In order to examine the trustworthiness of the parameter 
and standard error estimates in the LS-COM model, we conducted 
an MC simulation study. To our knowledge, no simulation 
study has so far scrutinized the statistical performance 
of complex longitudinal, multilevel, multiple-indicator 
CFA-MTMM models. 

According to the results of our MC simulation study, the 
LS-COM model can produce reliable parameter estimates even 
in small samples with just 100 targets and 2 raters per target. 
However, for such small samples the standard errors of LS-COM 
model parameters will be marginally biased. Most sensitive to bias 
are the standard errors of the method factor loading parameters 
(i.e., $\lambda_{UMijkl}$, $\lambda_{CMijkl}$, $\lambda_{Mijkl}$) as well as the standard errors of the 
unique method factor variance [i.e., $Var(UM_{rtjkl})$]. The standard 
error bias can be reduced by increasing the number of Level 1 
units (i.e., number of raters per target). In cases with at least 5 
raters per target and 100 targets, the LS-COM model produced unbiased 
parameters as well as standard errors in our simulation. In general, 
parameter estimates seemed more accurate in cases with low 
convergent validity. Low convergent validity is often seen in practice 
(e.g., Eid et al., 2003, 2008; Carretero-Dios et al., 2011; Pham 
et al., 2012), so the LS-COM model should generally yield 
unbiased parameter and SE estimates. 
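
To make the notions of parameter bias and standard error bias concrete, the following Python sketch shows one common way of quantifying them across MC replications. The formulas (relative deviation of the mean estimate from the population value, and relative deviation of the mean estimated standard error from the empirical standard deviation of the estimates) are a widely used convention and are an assumption here; this section does not state the exact criteria that were applied. The function names and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def relative_parameter_bias(estimates, true_value):
    """Relative deviation of the mean estimate from the population value."""
    return (mean(estimates) - true_value) / true_value


def relative_se_bias(estimates, standard_errors):
    """Relative deviation of the mean estimated SE from the empirical SD.

    The empirical SD of the estimates across replications serves as the
    benchmark for the "true" sampling variability of the estimator.
    """
    empirical_sd = stdev(estimates)
    return (mean(standard_errors) - empirical_sd) / empirical_sd


# Illustrative replication results (not data from the article).
estimates = [1.0, 2.0, 3.0]
standard_errors = [0.9, 0.9, 0.9]

parameter_bias = relative_parameter_bias(estimates, true_value=2.0)
se_bias = relative_se_bias(estimates, standard_errors)
```

A negative value of `se_bias` means that the estimated standard errors are on average too small, which would make significance tests of the affected parameters too liberal.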

The number of methods as well as the number of occasions of 
measurement did not seem to affect the accuracy of the parameter 
estimates or their standard errors. If anything, more occasions of 
measurement proved beneficial for the stability of the parameter 
estimation. This is most likely because strong MI was 
assumed for the repeated measures in the simulation. As a result, 
the ratio of available information to free parameters actually 
increased with more measurement occasions. It should be 
noted, however, that this condition might not be present in applications 
in which the assumption of strong MI does not hold or 
the number of occasions is very large. 

In contrast to the number of measurement occasions, an increasing 
number of constructs generally does make the LS-COM 
model more complex, because invariance assumptions are generally 
not imposed across different constructs. In cases with many 
constructs, we recommend splitting the complete LS-COM model 
into multiple submodels and analyzing all pairwise combinations 
of constructs, two constructs at a time. All coefficients of interest (e.g., 
correlations) can still be estimated without affecting the meaning 
of any parameter in the model. A prerequisite for this step-by-step 
procedure is that the same reference method is chosen in every submodel. 
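
The step-by-step procedure can be sketched in Python as an enumeration of all two-construct submodels that share one reference method. The function `pairwise_submodels` and the construct and method labels are hypothetical; the sketch only lists the submodels to be fitted and does not estimate any model.

```python
from itertools import combinations


def pairwise_submodels(constructs, reference_method):
    """List every two-construct submodel, all with the same reference method."""
    return [
        {"constructs": pair, "reference_method": reference_method}
        for pair in combinations(constructs, 2)
    ]


# Illustrative labels only (not the constructs analyzed in the article).
submodels = pairwise_submodels(
    ["construct_A", "construct_B", "construct_C", "construct_D"],
    reference_method="self_report",
)
for submodel in submodels:
    print(submodel["constructs"], submodel["reference_method"])
```

With J constructs this yields J(J - 1)/2 submodels, so four constructs require six separate analyses.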

The results of this simulation study support previous find- 
ings of classical SEM (see Bentler and Chou, 1987; Bollen, 1989, 
2002). Based on a simulation study, Bentler and Chou (1987) sug- 
gested that a ratio of 5:1 (observations per parameter) is sufficient 
for proper parameter estimates with regard to classical structural 
equation models. The results of our simulation study support 
this conclusion for LS-COM models. We therefore recommend 
sampling at least 5 raters per target and at least as many tar- 
gets as there are free parameters to be estimated. Our simulation 
study also revealed new insights into complex multilevel SEM, 
namely that the sample size on Level 1 is an important factor that 
influences the quality of model estimation. Previous simulation 
studies devoted to this research area claimed that the number of 
Level 1 units is less important than the number of Level 2 units 
(Maas and Hox, 2005). Our results show that the number of Level 
1 units can be crucial for the reduction of standard error bias in 
complex multilevel structural equation models. 
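
The sampling recommendation can be expressed as a simple design check. The Python sketch below is a hypothetical helper, not part of the article; it assumes that the number of Level 1 observations equals the number of targets times the number of raters per target, so that meeting both recommendations implies at least five observations per free parameter.

```python
def check_sampling_design(n_targets, n_raters_per_target, n_free_parameters,
                          min_raters_per_target=5):
    """Compare a planned design with the recommendations stated above."""
    if n_free_parameters <= 0:
        raise ValueError("n_free_parameters must be positive")
    # Assumption: every target is rated by the same number of raters.
    n_observations = n_targets * n_raters_per_target
    return {
        # at least as many targets (Level 2 units) as free parameters
        "enough_targets": n_targets >= n_free_parameters,
        # at least 5 interchangeable raters (Level 1 units) per target
        "enough_raters": n_raters_per_target >= min_raters_per_target,
        # ratio of Level 1 observations to free parameters
        "observations_per_parameter": n_observations / n_free_parameters,
    }


# Illustrative design: 100 targets, 5 raters each, 100 free parameters.
design = check_sampling_design(100, 5, 100)
```

A design with 100 targets but only 2 raters per target would fail the rater criterion, which corresponds to the conditions in which marginally biased standard errors were observed.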

So far, only a few studies have investigated the accuracy of χ² fit 
statistics in complex ML-SEMs (Ryu and West, 2009; Ryu, 2014; 
Schermelleh-Engel et al., 2014). The results of our simulation 
study are generally encouraging, as they indicate that the overall 
χ²-test of exact fit was only marginally biased with regard to 
a nominal alpha level of 5% for multivariate normally distributed 
and complete data. More specifically, our results indicate that the 
overall maximum likelihood χ²-test of exact fit may be slightly 
too liberal for complex ML-SEM models. However, we recommend 
using robust maximum likelihood estimation (MLR) 
when multivariate normality cannot be assumed. 
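
Whether a fit test is too liberal or too conservative can be judged by comparing its empirical rejection rate across replications with the nominal alpha level. The following Python sketch illustrates this idea under the assumption that the population model is correctly specified in every replication; the function names and the p-values are hypothetical, and the article may have quantified the deviation differently.

```python
def rejection_rate(p_values, alpha=0.05):
    """Proportion of replications in which the fit test rejects the model."""
    rejected = sum(1 for p in p_values if p < alpha)
    return rejected / len(p_values)


def deviation_from_nominal(p_values, alpha=0.05):
    """Positive values indicate a too liberal test, negative a too conservative one."""
    return rejection_rate(p_values, alpha) - alpha


# Illustrative p-values from ten replications (not data from the article).
p_values = [0.01, 0.20, 0.50, 0.04, 0.90, 0.30, 0.60, 0.70, 0.80, 0.95]
observed_rate = rejection_rate(p_values)
deviation = deviation_from_nominal(p_values)
```

In a real simulation study the number of replications would be far larger, because the rejection rate itself is estimated with sampling error.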

Future studies should focus on three issues associated with 
complex longitudinal multilevel MTMM modeling. First, the statistical 
effects of attrition (i.e., missingness) of the interchangeable 
raters across time and the possibilities of alternative modeling 
approaches should be investigated. Second, the robustness of χ² 
fit statistics in complex multilevel SEM with non-normal and 
(in)complete data should be examined, and alternative fit statistics 
for complex multilevel SEMs should be scrutinized. With 
respect to the investigation of fit statistics in multilevel SEM, 
researchers may be inspired by the recent work of Schermelleh-Engel 
et al. (2014) and Ryu (2014). Third, future studies should 
focus on possible extensions of the LS-COM model to other 
longitudinal modeling approaches [e.g., latent state-trait models, 
latent difference (change) models, latent growth curve models] 
with one or more sets of interchangeable methods and apply these 
models to real data. 

12. CONCLUSION AND GENERAL RECOMMENDATION 

In this work, we presented a new longitudinal multilevel CFA-MTMM 
model for the combination of structurally different 
and interchangeable methods. The model extends the spectrum 
of longitudinal MTMM modeling approaches by allowing the 
simultaneous investigation of method effects on different measurement 
levels across time. Based on the results of the 
simulation study, we recommend that researchers sample 
at least as many Level 2 units (i.e., targets) as there are free parameters 
to be estimated in the model and at least 5 interchangeable 
raters per target in order to obtain a sample size sufficient for proper 
estimation of parameters and their standard errors. Moreover, we suggest that 
researchers test the degree of MI when studying mean 
change of a given attribute across time. 